Extendable words in nucleotide sequences

Authors

  • Mikhail S. Gelfand
  • C. G. Kozhukhin
  • Pavel A. Pevzner

Abstract

Previous statistical analyses revealed several peculiarities of nucleotide sequences that preclude their description by existing models and thus allow DNA and RNA sequences to be distinguished from random A,T,G,C texts. These peculiarities stem from the unusual distribution of certain words in nucleotide sequences: while the distribution of most words is consistent with low-order Markov models, the distribution of certain words cannot be described by any previously proposed model (anomalies in the distribution of homonucleotide, homopurine, and homopyrimidine runs, complementary and mirror palindromes, and non-stationary words). In this work we introduce a probabilistic approach that is partly motivated by an analogy with linguistics. We also describe another important feature of DNA/RNA sequences: anomalies in the distribution of words of poor nucleotide composition. We show that some classes of these words are the major obstacle to a simple Markov description of nucleotide sequences.
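
The abstract contrasts words whose counts fit low-order Markov models with anomalous word classes such as palindromes. As a minimal sketch of the usual way such a fit is checked (a standard technique, not necessarily the authors' probabilistic approach), the Python below compares the observed count of a word with its expected count under an order-k Markov chain fitted to the same sequence, assuming overlapping word counts and a plain A, C, G, T alphabet. The helper names (`count_words`, `expected_count`, and the palindrome predicates) are hypothetical and chosen for illustration.

```python
from collections import Counter

# Complement map for the DNA alphabet A, C, G, T (assumed alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_words(seq: str, length: int) -> Counter:
    """Count overlapping occurrences of every word of the given length."""
    return Counter(seq[i:i + length] for i in range(len(seq) - length + 1))


def expected_count(seq: str, word: str, order: int) -> float:
    """Expected count of `word` under an order-`order` Markov chain fitted
    to `seq` itself (maximum-likelihood estimate from word counts N):
        E(w) = prod N(w[i:i+k+1]) for i = 0..m-k-1
               / prod N(w[i:i+k])   for i = 1..m-k-1
    """
    m, k = len(word), order
    if not 1 <= k <= m - 2:
        raise ValueError("order must satisfy 1 <= order <= len(word) - 2")
    longer = count_words(seq, k + 1)   # (k+1)-mer counts
    shorter = count_words(seq, k)      # k-mer counts
    expected = float(longer[word[0:k + 1]])
    for i in range(1, m - k):
        overlap = shorter[word[i:i + k]]
        if overlap == 0:
            return 0.0
        expected *= longer[word[i:i + k + 1]] / overlap
    return expected


def reverse_complement(word: str) -> str:
    """Reverse complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def is_complementary_palindrome(word: str) -> bool:
    """True if the word equals its own reverse complement (e.g. GAATTC)."""
    return word == reverse_complement(word)


def is_mirror_palindrome(word: str) -> bool:
    """True if the word reads the same forwards and backwards (e.g. ACCA)."""
    return word == word[::-1]
```

For a trinucleotide `w`, `count_words(seq, 3)[w] / expected_count(seq, w, 1)` gives an observed-to-expected ratio whenever the expected count is non-zero; ratios far from 1 flag words whose counts an order-1 model does not explain. The ratio alone ignores variance, so a fuller analysis would also need a significance measure.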

Similar resources

Data Mining for Identification of Forkhead Box O (FOXO3a) in Different Organisms Using Nucleotide and Tandem Repeat Sequences

Background: Deregulation of the FOXO3a gene, which belongs to the Forkhead box O (FOXO) family of transcription factors, can cause cancer (e.g. breast cancer). FOXO factors have an important role in ubiquitination, acetylation, de-acetylation, protein-protein interactions and phosphorylation. Understanding the regulation and mechanisms of FOXO3a can lead to cancer treatment. The aim of this study recent association...

Nucleotide sequence of cDNA encoding for preprochymosin in native goat (Capra hircus) from Iran

Prochymosin is one of the most important aspartic proteinases used as a milk-clotting enzyme in cheese production. In the present investigation we report the sequence of the cDNA encoding goat (Capra hircus) preprochymosin and compare its nucleotide and deduced amino acid sequences with the preprochymosin sequences of other ruminants. Like bovine prochymosin, the caprine prochymosin cDNA encodes 365 amino ...

Intraspecies Gene Variation within Putative Epitopes of Immunodominant Protein P48 of Mycoplasma agalactiae

P48 protein of Mycoplasma agalactiae is used to diagnose infection and was identified as a potential vaccine candidate. Given the genetic nature of mycoplasma and the variable sensitivity of P48-based serological diagnostic tests, intraspecies variation of the P48 nucleotide sequence was investigated in 13 field isolates from different provinces of Iran along with three vaccine strains. Samples were col...

Nucleotide and Amino Acid Changes in HN, F and SH genes of an Iranian Mumps Virus; RS-12, Following Attenuation to Vaccine Strain

Background and Aims: The wild-type RS-12 strain of mumps virus was isolated from an Iranian patient and attenuated after several serial passages. This study was designed to determine nucleotide and amino acid substitutions in the HN, F and SH genes during attenuation of the wild-type virus. Materials and Methods: The required viral samples were prepared at Razi Vaccine and Serum Institute. Vi...

Comparative genomics of human stem cell factor (SCF)

Stem cell factor (SCF) is a critical protein with key roles in the cell, such as hematopoiesis, gametogenesis and melanogenesis. In the present study a comparative analysis of SCF nucleotide sequences was performed in Humanoids using bioinformatics tools including NCBI-BLAST, MEGA6, and JBrowse. Our analysis of nucleotide sequences to find closely evolved organisms with high similarity by NCB...

Journal title:
  • Computer applications in the biosciences : CABIOS

Volume 8, Issue 2

Publication date: 1992